I.  INTRODUCTION


Complex computer simulation models (computer codes) usually must be
studied by performing a number of simulation runs. That is, we must
"experiment" with the simulation model. Standard experimental designs,
however, can very easily require more computer runs than are reasonable
or affordable, especially when many factors (i.e., input variables) are
present. In such cases it may be beneficial to invest a relatively small
number of runs in a preliminary experiment aimed at determining which
factors are the most important. By screening out those factors which appear
to be relatively unimportant, we can concentrate the major experimental
effort and expense on the important factors. The smaller the proportion of
important effects, the more it is to our advantage to conduct a screening
experiment.

Because running a large, complex simulation can be costly and
time-consuming, the number of runs available for screening is generally
severely limited. Typically, a supersaturated situation exists. That is, the
number of factors to be studied exceeds the number of available screening
runs. Although a number of screening strategies satisfying this constraint
have been proposed (see, for example, Kleijnen [4], Srivastava [9], and Smith
and Mauro [8]), no definite guidelines for selecting a screening strategy have
been established, nor have the available methods been evaluated and compared
systematically.

In this report we evaluate and discuss a two-stage screening procedure
which we refer to as an RP strategy. The first stage of this strategy is a
K factor random balance (RB) experiment where K denotes the total number of
factors to be screened (see Satterthwaite [7] and Budne [1]); the second
stage, a follow-up to the first stage, is a Plackett-Burman (PB) experiment
(see Plackett and Burman [6]). A factor is included in the second-stage
experiment only if it is determined to have a significant effect in the
first-stage experiment.

In a previous Desmatics, Inc. technical report, Mauro and Smith [5]
formally investigated and reported the performance of the RP class of
strategies in the case of zero error variance (i.e., where the simulation
response is observed without random error). Our purpose here is to extend
the performance analysis to the general case of nonzero error variance.

As a statistical basis to assess this strategy, we assume the following
model:

    $$y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \varepsilon_i \qquad (1.1)$$

where:

(1)  $y_i$ is the value of the response (i.e., output variable) in the
     i-th simulation run,

(2)  K is the total number of factors to be screened, each of which
     is at two levels coded +1 and -1,

(3)  $x_{ij}$ is +1 or -1 depending on the level of the j-th factor during
     the i-th simulation run,

(4)  $\beta_0$ is a component common to all responses, and $\beta_j$
     ($j \ge 1$) is the linear effect of the j-th factor,

and (5)  the error terms, $\varepsilon_i$, are independent, have common
     distribution $N(0, \sigma^2)$, and are independent of every design
     variable ($x_{ij}$) in the model.

For detecting the factors having major effects, model (1.1) is generally
sufficient. In essence, it is a first-order Taylor series approximation
to an actual relationship between output and input variables. Performance
evaluation will be restricted to this model.

The basic function of a screening strategy is to sort all the factors
into essentially two groups: a group containing the so-called important
factors and a group containing the so-called unimportant factors. The
factors classified as important can then be subjected to more detailed
study in subsequent experimentation. In an RP strategy, a factor is
classified as important only if it reaches the second-stage experiment and is
subsequently determined to have a significant effect. All other factors
are classified as unimportant.

The actual importance (or unimportance) of a factor, however, will
more generally depend on the magnitude of its effect relative to that of
experimental error, $\sigma$, and to the magnitudes of the other effects
present. From a practical standpoint, the greater (lesser) the degree of
importance, the larger should be the probability of classifying the factor as
important (unimportant). In this report, we summarize performance in terms of
a strategy's sensitivity (i.e., power) for declaring a factor important and
in terms of the expected number of runs the strategy requires.


II.  PRELIMINARY  DISCUSSION 


In a two-level (±1) RB design, each column of the design matrix
consists of N/2 +1's and N/2 -1's where N (an even number) denotes the total
number of runs. The allocation of +1's and -1's to each column is made
randomly so that all possible configurations of N/2 +1's and N/2 -1's
(there are $\binom{N}{N/2}$ in all) are equally likely, with each column
receiving an independent allocation. The condition that every column of the
design matrix has an equal number of runs at the high (+1) and low (-1) levels
assures us that estimates of the individual factor effects are unconfounded
with the overall mean effect, which can be represented by a column of N
+1's.
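The construction just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the report: the function name `random_balance_design` and the `seed` argument are assumptions introduced here. Each column is built by shuffling N/2 +1's and N/2 -1's independently of every other column.

```python
import random


def random_balance_design(n_runs, n_factors, seed=None):
    """Return an n_runs x n_factors list-of-lists of +1/-1 levels.

    Each column holds exactly n_runs/2 entries of +1 and n_runs/2 of -1,
    arranged by an independent random shuffle, so every balanced
    configuration of a column is equally likely.
    """
    if n_runs <= 0 or n_runs % 2 != 0:
        raise ValueError("n_runs must be a positive even number")
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        column = [1] * (n_runs // 2) + [-1] * (n_runs // 2)
        rng.shuffle(column)  # independent allocation for this factor
        columns.append(column)
    # Transpose so that rows are runs and columns are factors.
    return [[columns[j][i] for j in range(n_factors)] for i in range(n_runs)]


# Hypothetical example: 20 factors screened in only 8 runs.
design = random_balance_design(n_runs=8, n_factors=20, seed=1)
# Every column sums to zero, i.e. it is orthogonal to the column of +1's.
column_sums = [sum(row[j] for row in design) for j in range(20)]
```

Because N is chosen independently of K, the same call works for any even number of runs, including the supersaturated case N < K shown here.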

RB designs are attractive since they are easy to prepare for any N
and K. Furthermore, unlike more orthodox designs where some mathematical
relationship usually exists between N and K, we can select N independently
of K in an RB design. This affords us a great deal of flexibility and
simplification, particularly if K is large. Of course, an RB design also has
its corresponding disadvantages. Indeed, the main objection to such designs
is that they confound factors to a random degree and have no specialized
or unique method of analysis. (For a more complete discussion of the pros
and cons of random designs, the reader may consult Technometrics, Vol. 1,
No. 2, May 1959.) The results of the present report may help to resolve
some of the controversy surrounding random balance experimentation.

Because of the independent allocation of factor levels to each design
column, practically any technique used to analyze data without random
balance properties can be used to analyze any (sufficiently small) subset of
factors in a random balance design. This is accomplished by ignoring any
factor not included in the particular subset being analyzed. The effects
of the ignored factors, however, will be absorbed into the error component
of the model. In the simplest case, we can consider each factor separately
and apply some standard statistical analysis. In this report, we consider
a standard F-test applied separately to each factor as the method of analysis
for random balance data, and, for simplicity, conduct each F-test at the
same level of significance, $\alpha_1$.

Factors having a significant F-ratio in the first stage are carried
over to the second stage to be tested in a PB follow-up experiment. Because
PB designs are orthogonal, the second stage separates any confounding
between the factors carried over from the RB first stage. Factors not
formally included in the second-stage experiment should be held at a constant
level so as not to bias any of the second-stage estimates. The results of the
second stage can be analyzed by the usual analysis of variance procedures
for factorial experiments.

Although we can specify the number of first-stage runs N, the number
of second-stage runs M will depend on the number of factors S carried over
from the first stage. For reasons of economy, we generally employ the
smallest PB design having at least S+1 runs. However, to avoid possible
saturation and thus no degrees of freedom to estimate experimental error,
the convention we will follow here is to employ the smallest PB design that
guarantees at least one error degree of freedom. Since PB designs are only
available for numbers of runs that are multiples of four, we can obtain a
minimum of one and a maximum of four error degrees of freedom by following
this procedure.

We can express M mathematically as M = B(S+1) where

    $$B(x) = x + 4 - x \pmod 4. \qquad (2.1)$$

A useful approximation to B(x) is given by

    $$B(x) \approx x + 2.5. \qquad (2.2)$$

The total number of runs R required by an RP strategy, therefore, will be
N + M = N + B(S+1). Note that since M is random, so is R. Using (2.2) we
can approximate E(R) by

    $$E(R) \approx N + E(S) + 3.5. \qquad (2.3)$$

Since $|B(x) - (x + 2.5)| \le 1.5$, the approximation in (2.3) can differ
from E(R) by at most 1.5.
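The run-count rule in (2.1) and the bound behind (2.3) are easy to check numerically. The sketch below is illustrative only; the helper names `pb_runs` and `second_stage_runs` are assumptions introduced here, not notation from the report.

```python
def pb_runs(x):
    """B(x) = x + 4 - (x mod 4), as in equation (2.1)."""
    return x + 4 - (x % 4)


def second_stage_runs(s):
    """Number of PB runs M = B(s + 1) when s factors are carried over."""
    return pb_runs(s + 1)


# Tabulate M, the error degrees of freedom, and the error of the
# approximation B(x) ~ x + 2.5 for a range of carried-over factors.
for s in range(0, 12):
    m = second_stage_runs(s)
    error_df = m - (s + 1)
    approx_error = m - (s + 1 + 2.5)
    print(s, m, error_df, approx_error)
```

For every s the error degrees of freedom fall between one and four, and the approximation error is one of -1.5, -0.5, 0.5, or 1.5, which is where the bound of 1.5 comes from.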

In summary, an RP strategy is completely determined by the number of
first-stage runs, N, and the significance levels of the first- and
second-stage tests, $\alpha_1$ and $\alpha_2$, respectively. (We assume the
same level of significance is used in all second-stage testing.) We denote
such a strategy by RP(N, $\alpha_1$, $\alpha_2$). The parameters N,
$\alpha_1$, and $\alpha_2$ are at our disposal in selecting and specifying an
RP screening strategy. In Section III we show how these quantities affect the
performance of an RP screening plan.


III.  PERFORMANCE  EVALUATION 


In this section we examine the performance of an RP(N, $\alpha_1$,
$\alpha_2$) strategy. We discuss the first and second stages individually,
followed by consideration of the combined stages. Of primary interest is the
probability that a given factor is declared important by an RP strategy, or
equivalently the probability that the factor tests significant in both the
first and second stages. In general, this probability is too complex to be
evaluated analytically. In lieu of an analytic solution, we develop an
approximation to this probability. We also discuss an approximation to the
expected value of R, the total number of runs required by an RP strategy.

A.  FIRST  STAGE:  RANDOM  BALANCE  DESIGN 

In matrix terms we can write model (1.1) as
$\mathbf{y} = \beta_0 \mathbf{1} + X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$
where $\mathbf{1}$ is an N x 1 vector of +1's, $\mathbf{y}$ is an N x 1 vector
of responses, $\boldsymbol{\varepsilon}$ is an N x 1 vector of error terms,
$\boldsymbol{\beta}$ is a K x 1 vector of factor effects, and X is an
N x K design matrix. In a random balance experiment,
$X = [\mathbf{x}_1, \ldots, \mathbf{x}_K]$ is a stochastic matrix whose j-th
column, $\mathbf{x}_j$, is an N x 1 vector consisting of a random arrangement
of N/2 +1's and N/2 -1's. The K column vectors of X are stochastically
independent.

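The simple least squares estimator of $\beta_j$ ($j \ge 1$) is given by

    $$\hat\beta_j = (\bar y_{+j} - \bar y_{-j})/2 \qquad (3.1)$$

where $\bar y_{+j}$ ($\bar y_{-j}$) is the average value of the response over
the N/2 runs at the +1 (-1) level of the j-th factor. (By simple least squares
we mean that each $\beta_j$ is estimated ignoring all other factors.) In
matrix terms

    $$\hat\beta_j = (\mathbf{x}_j'\mathbf{y})/N = \Big[\sum_{i=1}^{K} \beta_i \mathbf{x}_j'\mathbf{x}_i + \mathbf{x}_j'\boldsymbol{\varepsilon}\Big]/N. \qquad (3.2)$$

Thus,

    $$E(\hat\beta_j) = (1/N)\Big[\sum_{i=1}^{K} \beta_i E(\mathbf{x}_j'\mathbf{x}_i) + E(\mathbf{x}_j'\boldsymbol{\varepsilon})\Big], \qquad (3.3)$$

and

    $$V(\hat\beta_j) = (1/N^2)\Big[\sum_{i=1}^{K} \beta_i^2 V(\mathbf{x}_j'\mathbf{x}_i) + V(\mathbf{x}_j'\boldsymbol{\varepsilon})\Big], \qquad (3.4)$$

where in (3.4) we make use of the fact that
$\mathbf{x}_j'\boldsymbol{\varepsilon}$, $\mathbf{x}_j'\mathbf{x}_1$,
$\mathbf{x}_j'\mathbf{x}_2$, ..., $\mathbf{x}_j'\mathbf{x}_K$ are mutually
independent for fixed j.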
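To make the equivalence between (3.1) and the matrix form (3.2) concrete, the following sketch computes the simple least squares estimate both ways for one balanced column. The function names and the six responses are made up for illustration; they are not data from the report.

```python
def simple_ls_estimate(x_col, y):
    """Estimate beta_j as x_j'y / N, ignoring all other factors."""
    n = len(y)
    return sum(x * v for x, v in zip(x_col, y)) / n


def half_mean_difference(x_col, y):
    """Estimate beta_j as (mean of y at +1 minus mean of y at -1) / 2."""
    high = [v for x, v in zip(x_col, y) if x == 1]
    low = [v for x, v in zip(x_col, y) if x == -1]
    return (sum(high) / len(high) - sum(low) / len(low)) / 2


x_j = [1, -1, 1, -1, -1, 1]          # balanced column, N = 6
y = [10.0, 4.0, 9.0, 6.0, 5.0, 8.0]  # made-up responses

estimate_matrix_form = simple_ls_estimate(x_j, y)
estimate_mean_form = half_mean_difference(x_j, y)
```

The two forms agree only because the column is balanced, so that $\mathbf{x}_j'\mathbf{1} = 0$ and $\mathbf{x}_j'\mathbf{x}_j = N$.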

It is clear that the exact sampling distribution of $\hat\beta_j$ is
intractable. In the Appendix, however, we show that

    $$E(\hat\beta_j) = \beta_j, \qquad (3.5)$$

    $$V(\hat\beta_j) = (\tau^2 - \beta_j^2)/(N-1) + \sigma^2/N, \qquad (3.6)$$

and

    $$\mathrm{cov}(\hat\beta_i, \hat\beta_j) = \beta_i\beta_j/(N-1), \qquad (3.7)$$

where $\tau^2 = \sum_{m=1}^{K} \beta_m^2$. The simple least squares estimator
defined in (3.1) is therefore an unbiased estimator of $\beta_j$, although its
variance can be seriously inflated. Furthermore, the correlation between
$\hat\beta_i$ and $\hat\beta_j$ is roughly (replacing N - 1 in (3.6) with N)

    $$\mathrm{corr}(\hat\beta_i, \hat\beta_j) \approx \beta_i\beta_j/(V_iV_j) \qquad (3.8)$$

where $V_m = [\tau^2 + \sigma^2 - \beta_m^2]^{1/2}$.

The correlation in (3.8) is a measure of the confounding between
$\hat\beta_i$ and $\hat\beta_j$. Notice that increasing N cannot decrease the
confounding in an RB design where simple least squares is used. Moreover, the
degree of confounding between $\hat\beta_i$ and $\hat\beta_j$ is dependent
upon the magnitudes of other effects in the model.
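The approximate correlation in (3.8) can be evaluated directly, which makes it easy to see that N does not enter and that the other effects do. The sketch below is illustrative; the function name `approx_correlation` and the effect values are assumptions chosen here, and the formula breaks down (division by zero) in the degenerate case where $\sigma = 0$ and every other effect is zero.

```python
import math


def approx_correlation(betas, sigma, i, j):
    """Approximate corr(beta_hat_i, beta_hat_j) from equation (3.8).

    betas holds the K linear effects; i and j are zero-based indices.
    """
    tau_sq = sum(b * b for b in betas)
    v_i = math.sqrt(tau_sq + sigma ** 2 - betas[i] ** 2)
    v_j = math.sqrt(tau_sq + sigma ** 2 - betas[j] ** 2)
    return betas[i] * betas[j] / (v_i * v_j)


# Two active factors, no error, nothing else in the model:
# the two estimates are (approximately) perfectly confounded.
two_factor_case = approx_correlation([3.0, 4.0], 0.0, 0, 1)

# Adding a third, larger effect dilutes the confounding between the first two.
three_factor_case = approx_correlation([3.0, 4.0, 12.0], 0.0, 0, 1)
```

In the two-factor case both estimates are driven by the same random inner product $\mathbf{x}_1'\mathbf{x}_2$, which is why the approximate correlation equals one.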


Because each factor is at two levels, the standard F-test to test
$H_0: \beta_j = 0$ versus $H_1: \beta_j \ne 0$ is equivalent to a simple
two-sample t-test between the high (+1) and the low (-1) levels of the j-th
factor. The associated test statistic $t_j$ is given by

    $$t_j = \hat\beta_j / [SSE_j / N(N-2)]^{1/2} \qquad (3.9)$$

where $SSE_j$ is the familiar analysis of variance notation for the error
sum of squares of factor j. Computationally,

    $$SSE_j = \sum_{H} (y_i - \bar y_{+j})^2 + \sum_{L} (y_i - \bar y_{-j})^2 \qquad (3.10)$$

where the first (second) summation is taken over the N/2 observations at
the high (low) level of the j-th factor. Alternative computational formulas
are

    $$SSE_j = \sum_{i=1}^{N} y_i^2 - N\hat\beta_j^2 - N\bar y^2$$

    $$\phantom{SSE_j} = \sum_{i=1}^{N} (y_i - \bar y)^2 - N\hat\beta_j^2 \qquad (3.11)-(3.13)$$

where $\bar y$ is the overall mean of the responses.
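The agreement between the defining form (3.10) and the shortcut form based on the overall mean can be checked on a small example, along with the t statistic in (3.9). This is a sketch with made-up data; the function names are assumptions introduced here.

```python
import math


def sse_by_groups(x_col, y):
    """Equation (3.10): within-level sums of squares, added together."""
    total = 0.0
    for level in (1, -1):
        group = [v for x, v in zip(x_col, y) if x == level]
        mean = sum(group) / len(group)
        total += sum((v - mean) ** 2 for v in group)
    return total


def sse_shortcut(x_col, y):
    """Shortcut form: sum((y_i - ybar)^2) - N * beta_hat_j^2."""
    n = len(y)
    ybar = sum(y) / n
    beta_hat = sum(x * v for x, v in zip(x_col, y)) / n
    return sum((v - ybar) ** 2 for v in y) - n * beta_hat ** 2


def t_statistic(x_col, y):
    """Equation (3.9): beta_hat_j / sqrt(SSE_j / (N (N - 2)))."""
    n = len(y)
    beta_hat = sum(x * v for x, v in zip(x_col, y)) / n
    return beta_hat / math.sqrt(sse_by_groups(x_col, y) / (n * (n - 2)))


x_j = [1, -1, 1, -1, -1, 1]          # balanced column, N = 6
y = [10.0, 4.0, 9.0, 6.0, 5.0, 8.0]  # made-up responses

sse_a = sse_by_groups(x_j, y)
sse_b = sse_shortcut(x_j, y)
t_j = t_statistic(x_j, y)
```

The shortcut form needs only the overall sum of squares and the K estimates, which matters when K separate tests are run on the same N responses.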

We reject $H_0$ in favor of $H_1$ if the observed value of $|t_j|$ equals or
exceeds $t(N-2; \alpha_1/2)$, the $100(1 - \alpha_1/2)$ percentage point of
"Student's" t-distribution having (N - 2) degrees of freedom. We should point
out, however, that $t_j$ does not truly follow a t-distribution since the
assumptions of so-called normal theory are not met exactly. As a result, the
true size of the test may differ from $\alpha_1$. It is well known, though,
that a two-sample t-test is a rather robust testing procedure. Such optimism
may even suggest that normal theory can provide an adequate approximation to
the distribution of $t_j$ under $H_1$.

In a previous Desmatics, Inc. technical report [5], we studied the
distribution of $t_j$ when all nonzero factor effects were of equal
magnitude. We found that even for relatively small values of N, normal theory
provided approximations that agreed very closely with corresponding Monte
Carlo estimates. Preliminary indication is that the normal-theory
approximations also perform well for an arbitrary set of effects. The
approximation states that for fixed N, $t_j$ has approximately a noncentral
t-distribution with (N - 2) degrees of freedom and noncentrality parameter

    $$\delta_{1j} = N^{1/2}\beta_j / [\tau^2 - \beta_j^2 + \sigma^2]^{1/2}. \qquad (3.14)$$


To apply this approximation we define

    $$\phi_1(\delta) = P\{|T_{N-2}(\delta)| \ge t(N-2; \alpha_1/2)\}, \qquad (3.15)$$

where $T_\nu(\delta)$ denotes a random variable having a noncentral
t-distribution with $\nu$ degrees of freedom and noncentrality parameter
$\delta$, and we let

    $$R_{1j} = \begin{cases} 1, & \text{if } H_0: \beta_j = 0 \text{ is rejected in the first stage} \\ 0, & \text{otherwise.} \end{cases}$$

We see immediately that $S = \sum_{j=1}^{K} R_{1j}$ where, as defined
previously, S denotes the number of factors carried over to the PB second
stage from the RB first stage.


From our definition of $R_{1j}$, $P(R_{1j} = 1)$ represents the power of the
first-stage F-test for detecting the effect of the j-th factor. The above
approximation to the distribution of $t_j$ gives

    $$P(R_{1j} = 1) \approx \phi_1(\delta_{1j}). \qquad (3.16)$$

Notice that if $\beta_j = 0$, then $\delta_{1j} = 0$ and
$P(R_{1j} = 1) \approx \phi_1(0) = \alpha_1$. We note further that
$E(R_{1j}) = P(R_{1j} = 1)$, so that

    $$E(S) = \sum_{j=1}^{K} E(R_{1j}) \approx \sum_{j=1}^{K} \phi_1(\delta_{1j}). \qquad (3.17)$$

Introducing (3.17) into (2.3) yields

    $$E(R) \approx N + 3.5 + \sum_{j=1}^{K} \phi_1(\delta_{1j}). \qquad (3.18)$$

Thus, the expected total number of runs for an RP(N, $\alpha_1$, $\alpha_2$)
strategy is given approximately by (3.18).
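A rough sketch of how (3.14) and (3.18) might be evaluated in practice follows. It is an illustration under stated assumptions, not the authors' code: the function names are introduced here, and the first-stage powers $\phi_1(\delta_{1j})$ are passed in as plain numbers because evaluating the noncentral t-distribution requires tables or a statistics library that this stdlib-only sketch does not assume. The example values are made up.

```python
import math


def first_stage_noncentrality(betas, sigma, n_runs):
    """delta_1j = sqrt(N) * beta_j / sqrt(tau^2 - beta_j^2 + sigma^2), eq. (3.14).

    Assumes the denominator is nonzero, i.e. sigma > 0 or at least one
    other effect is nonzero.
    """
    tau_sq = sum(b * b for b in betas)
    return [
        math.sqrt(n_runs) * b / math.sqrt(tau_sq - b * b + sigma ** 2)
        for b in betas
    ]


def expected_total_runs(n_runs, first_stage_powers):
    """E(R) ~ N + 3.5 + sum of first-stage rejection probabilities, eq. (3.18)."""
    return n_runs + 3.5 + sum(first_stage_powers)


# Made-up example: two active factors, no error, N = 16 first-stage runs.
deltas = first_stage_noncentrality([3.0, 4.0], sigma=0.0, n_runs=16)

# Made-up powers: two active factors detected with probability 0.9 and 0.8,
# eight null factors each rejected with probability alpha_1 = 0.1.
approx_runs = expected_total_runs(16, [0.9, 0.8] + [0.1] * 8)
```

Note how the null factors contribute $\alpha_1$ each to the expected run count, so a large K inflates E(R) even when few factors are active.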

B.  SECOND  STAGE:  PLACKETT-BURMAN  DESIGN


Unlike  RB  experimentation,  the  testing  characteristics  of  PB  designs 
are  fairly  well  known.  For  brevity,  therefore,  we  do  not  rederive  these 
characteristics  here  but  instead  apply  and  state  them  where  needed. 

We consider, then, a Plackett-Burman design for the study of S factors
in M = B(S+1) runs, and we again consider testing $H_0: \beta_j = 0$ versus
$H_1: \beta_j \ne 0$, assuming of course that factor j is one of the S factors
being analyzed, i.e., $R_{1j} = 1$. We define

    $$R_{2j} = \begin{cases} 1, & \text{if } H_0: \beta_j = 0 \text{ is rejected in the second stage} \\ 0, & \text{otherwise,} \end{cases}$$

and let

    $$\psi_2(s, \delta) = P\{|T_{d(s)}(\delta)| \ge t(d(s); \alpha_2/2)\} \qquad (3.19)$$

where $d(s) = M - (s+1) = B(s+1) - (s+1)$. The quantity d(s) represents the
number of error degrees of freedom in the PB design given that S = s factors
are carried over from the RB first stage. We note that if $\delta = 0$, then
$\psi_2(s, \delta) = \alpha_2$.


Conditional on S = s and on $R_{1j} = 1$, we can show

    $$P(R_{2j} = 1 \mid S = s, R_{1j} = 1) = \psi_2(s, \delta_{2j}) \qquad (3.20)$$

where

    $$\delta_{2j} = [B(s+1)]^{1/2} \beta_j/\sigma. \qquad (3.21)$$

The conditional probability in (3.20) represents the probability that the
j-th factor tests significantly different from zero in the second stage given
that it and s-1 other factors test significantly different from zero in the
RB first stage.
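As a small worked illustration of (3.19) and (3.21), consider two adjacent values of s. The values s = 6 and s = 7 are chosen here purely for illustration and do not come from the report's case studies.

```latex
\begin{aligned}
s = 6:&\quad M = B(7) = 7 + 4 - 3 = 8,  &\quad d(6) &= 8 - 7 = 1,  &\quad \delta_{2j} &= \sqrt{8}\,\beta_j/\sigma, \\
s = 7:&\quad M = B(8) = 8 + 4 - 0 = 12, &\quad d(7) &= 12 - 8 = 4, &\quad \delta_{2j} &= \sqrt{12}\,\beta_j/\sigma.
\end{aligned}
```

Carrying over one additional factor moves the second stage from 8 to 12 runs and from one to four error degrees of freedom, so the second-stage power for factor j depends on s through both d(s) and $\delta_{2j}$.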

This is basically the only result we need for the second stage. We
reemphasize that S is a random variable. In subsection C we discuss an
approximation to $P(R_{2j} = 1)$, the probability that the j-th factor is
declared important (i.e., $H_0: \beta_j = 0$ is rejected in both stages) by
an RP(N, $\alpha_1$, $\alpha_2$) strategy.


C.  COMBINED  STAGES:  THE  RP(N, $\alpha_1$, $\alpha_2$)  STRATEGY


To evaluate $P(R_{2j} = 1)$, we observe first that

    $$P(R_{2j} = 1) = P(R_{1j} = 1)\, P(R_{2j} = 1 \mid R_{1j} = 1). \qquad (3.22)$$

Since an approximation to $P(R_{1j} = 1)$ is given in (3.16), it remains to
consider the conditional probability that $R_{2j} = 1$ given $R_{1j} = 1$.
We define $S_j = S - R_{1j}$. Now

    $$P(R_{2j} = 1 \mid R_{1j} = 1) = \sum_{s=1}^{K} P(R_{2j} = 1, S_j = s-1 \mid R_{1j} = 1)$$

    $$= \sum_{s=1}^{K} P(S_j = s-1 \mid R_{1j} = 1)\, P(R_{2j} = 1 \mid S_j = s-1, R_{1j} = 1)$$

    $$= \sum_{s=1}^{K} P(S_j = s-1 \mid R_{1j} = 1)\, \psi_2(s, \delta_{2j}). \qquad (3.23)$$

To evaluate (3.23) we must somehow approximate the conditional
distribution of $S_j$ given $R_{1j} = 1$. Unfortunately, this is not a
straightforward exercise; the conditional distribution of $S_j$ given
$R_{1j} = 1$ is extremely complex. Nevertheless, in many screening
situations, for moderately sized N the conditional distribution of $S_j$
given $R_{1j} = 1$ might be reasonably approximated as the convolution of
(K - 1) independent Bernoulli random variables having success probabilities
$\{\phi_1(\delta_{1m}): m = 1, 2, \ldots, j-1, j+1, \ldots, K\}$.

Alternatively, following Feller [3], for large N and moderate values of
$\lambda_j = \sum_{m \ne j} \phi_1(\delta_{1m}) = E(S) - \phi_1(\delta_{1j})$,
we might reasonably approximate the conditional distribution of $S_j$ given
$R_{1j} = 1$ by a Poisson distribution having mean $\lambda_j$. The Poisson
approximation is generally much easier to apply than the Bernoulli
convolution approximation, particularly when K is large.
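Both approximations to the conditional distribution of $S_j$ can be plugged into (3.23) in a few lines. The sketch below is one possible arrangement, not the authors' implementation: the function names are assumptions, and the second-stage power is supplied by the caller as a hypothetical callable `psi2(s)` (standing for $\psi_2(s, \delta_{2j})$), since evaluating it requires the noncentral t-distribution.

```python
import math


def bernoulli_convolution_pmf(probs):
    """pmf of a sum of independent Bernoulli variables (dynamic programming)."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this factor is not carried over
            nxt[k + 1] += mass * p       # this factor is carried over
        pmf = nxt
    return pmf


def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def conditional_second_stage_power(other_powers, psi2, use_poisson=False):
    """Approximate P(R_2j = 1 | R_1j = 1) as in equation (3.23).

    other_powers: first-stage rejection probabilities of the K - 1 other factors.
    psi2: callable psi2(s) giving the second-stage power when s factors
          (factor j included) are carried over; supplied by the caller.
    """
    k_minus_1 = len(other_powers)
    if use_poisson:
        lam = sum(other_powers)
        # Truncated at K - 1, matching the finite sum in (3.23).
        weights = [poisson_pmf(k, lam) for k in range(k_minus_1 + 1)]
    else:
        weights = bernoulli_convolution_pmf(other_powers)
    # weights[k] approximates P(S_j = k | R_1j = 1); then s = k + 1.
    return sum(w * psi2(k + 1) for k, w in enumerate(weights))
```

The Bernoulli convolution costs on the order of K squared operations, while the Poisson version needs only the single mean $\lambda_j$, which is consistent with the remark that the Poisson approximation is easier to apply when K is large.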

We note that the means of both approximating distributions agree with
the conditional mean of $S_j$ given $R_{1j} = 1$. The same cannot be said of
their variances, however. If we define
$P_{ij} = P\{R_{1i} = 1, R_{1j} = 1\}$ for any i and j, then

    $$V(S_j \mid R_{1j} = 1) = \sum_{i \ne j} P_{ii}(1 - P_{ii}) + \sum_{r \ne j} \sum_{\substack{m \ne j \\ m \ne r}} (P_{rm} - P_{rr}P_{mm}). \qquad (3.24)$$

The variance of the Bernoulli convolution approximation is given by
$\sum_{i \ne j} P_{ii}(1 - P_{ii})$; the variance of the Poisson approximation
is given by $\lambda_j = \sum_{i \ne j} P_{ii}$. Perhaps we could obtain a
more refined approximating distribution if we could equate not only the first
order but also the second order moments. Unfortunately, we have not yet been
able to reasonably approximate $P_{rm}$, $r \ne m$.

D.  MONTE  CARLO  RESULTS 

As a check on the various approximations presented, we conducted three
Monte Carlo case studies. The results are summarized in Tables 1, 2, and 3.
As can be seen from these tables, the results are extremely encouraging and
suggest that the approximations of E(R), $P(R_{1j} = 1)$, and
$P(R_{2j} = 1)$, presented herein, are quite reasonable.

As can be noted from the tables, the approximations of $P(R_{2j} = 1)$ based
on the Bernoulli convolution and Poisson distribution approaches yield
essentially the same results in the first two case studies. Because of this
agreement and the complexity of the calculations associated with the Bernoulli
convolution for the third case study, we used only the Poisson approximation
in that case.

Of the three Monte Carlo case studies, most noteworthy is the third
case study, which we feel is more akin to situations encountered in practice
than the first and second case studies. In this case, nonzero factor effects
vary in magnitude, ranging between $\sigma$ and $6\sigma$ in absolute
magnitude. Moreover, the majority of the effects are relatively small. Even
here the approximations do quite well.
Table 1:  Results of Case Study I. (Value in parentheses represents estimated
standard deviation of Monte Carlo estimate.)

Table 2:  Results of Case Study II. (Value in parentheses represents estimated
standard deviation of Monte Carlo estimate.)

Table 3:  Results of Case Study III. (Value in parentheses represents
estimated standard deviation of Monte Carlo estimate.)


IV.  PRACTICAL  CONSIDERATIONS 


Although analyzing each factor separately in an RB design via an F-test
is a relatively quick and simple testing procedure, it is not necessarily
the most powerful method of analyzing data from a random balance
design. Presumably, more sophisticated statistical techniques (such as
least squares stepwise methods) that analyze more than one factor at a time
would provide greater power. Such methods, however, may not be
computationally feasible if K is extremely large and may be severely limited
if N is very small relative to K. In addition, evaluating the efficacy of many
of these methods will be a difficult, if not insurmountable, task. The
individual F-test approach, as we have seen, admits a tractable assessment
that may provide a lower bound for the performance characteristics of these
alternative, more sophisticated analysis techniques.

The power of the K separate F-tests can, of course, be increased
directly by using a larger sample size N or a larger level of significance
$\alpha_1$. However, by increasing N we increase our testing cost by requiring
more first-stage runs, and by increasing $\alpha_1$ we increase not only the
probability of first-stage Type I error (i.e., rejecting $H_0$ when it is
true), but also the expected number of factors carried over to the second
stage. Increasing $\alpha_1$, then, will also increase our expected testing
cost by requiring, on the average, more second-stage runs.

Similarly, we can increase the power of the second-stage analysis by
using a larger PB design or a larger level of significance $\alpha_2$. Unlike
adjusting $\alpha_1$, however, adjusting $\alpha_2$ does not affect the
expected total number of runs required by an RP strategy. The power of the
second-stage analysis might also be increased if some factor effects can be
reasonably assumed to be negligible, in which case we can pool their
associated sums of squares into the error sum of squares to obtain a pooled
error estimate having more degrees of freedom for error than the unpooled
estimate. This is particularly tempting when d(s) = 1 where d(s) is defined
as in (3.19). Caution, however, must be exercised whenever effects are pooled
since pooling has a tendency to diminish the denominator expected mean square
of the F-statistic. Nonetheless, because of the relatively large gain in
power that accrues for each additional degree of freedom (up to a total
of about 5), pooling may be very attractive.

To help determine which effects, if any, might be reasonable to combine
in the second-stage analysis, estimated effects can be plotted on
normal probability paper. In this technique, due to Daniel [2], small
effects should fall approximately along a straight line, while large effects
should tend to fall far from the line. This technique by itself can also
serve as an alternative method of analysis, although it generally relies
heavily on subjective judgement.

Further, in our evaluation of performance in Section III, we assumed
for simplicity that all tests of significance in the first and second stages
are conducted at the same levels, namely, $\alpha_1$ and $\alpha_2$,
respectively. In practice, however, differing levels of significance may be
used. Larger significance levels could conceivably be used when testing
factors anticipated prior to experimentation to have a major effect on the
response. This would afford us greater flexibility in regulating the effect
of Type I and Type II errors. We note that the results of Section III can be
easily modified to allow for distinct levels of significance in the first or
second stage. Using different significance levels for different factors
implies the incorporation of prior knowledge into a screening strategy. Thus
a Bayesian framework would be called for. However, we will not venture into
Bayesian territory in this report.

Finally,  we  note  that  the  RB  first  stage  can  be  used  alone  as  a 
method  of  factor  screening.  An  immediate  advantage  of  a  screening  plan 
based  solely  on  an  RB  design  is  that  the  total  number  of  runs  used  for 
screening  can  be  fixed  prior  to  experimentation.  A  quantitative  evaluation 
of  the  advantages  and  disadvantages  of  using  a  one-stage  versus  a  two- 
stage  procedure  will  be  addressed  in  a  forthcoming  report. 
V.  SUMMARY


In this report we have endeavored to evaluate the efficacy of the RP
screening method. Of course, in order to select an RP strategy, we must
consider both the accuracy of factor classification and the number of runs
required. Because of the intractable nature of an exact solution, we
have developed approximations to (1) the probability that a given factor
is declared important and (2) the expected number of required runs. When
compared with the results of three Monte Carlo studies, these approximations
fared extremely well.

The  results  of  this  report  can  be  used  as  a  practical  guide  in  de¬ 
cisions  about  the  possible  use  and  choice  of  an  RP  strategy.  In  many  ways 
the  actual  selection  of  an  RP  strategy  is  similar  to  the  process  of  speci¬ 
fying  the  sample  size  for  an  analysis  of  variance  problem.  That  is,  we 
must  consider  trade-offs  between  Type  I  error,  power,  and  total  number  of 
runs  required. 
APPENDIX


If H is a discrete random variable having probability distribution

    $$P(H = h) = \begin{cases} \binom{r}{h}^2 \Big/ \binom{2r}{r}, & h = 0, 1, 2, \ldots, r \\ 0, & \text{otherwise,} \end{cases} \qquad (A.1)$$

where r is any positive integer, we write $H \sim H(r)$. The class of
distributions defined in (A.1) is a symmetric subfamily of the hypergeometric
family of distributions. If $H \sim H(r)$, then $E(H) = r/2$ and
$V(H) = (r/2)^2/(2r-1)$.

We define $\mathbf{f}_i = (\mathbf{x}_i + \mathbf{1})/2$ where $\mathbf{x}_i$
is the i-th column vector of an RB design matrix and note that, for
$i \ne j$, $\mathbf{f}_j'\mathbf{f}_i \sim H(N/2)$. It follows that, for
$i \ne j$, $(\mathbf{x}_j'\mathbf{x}_i + N)/4 \sim H(N/2)$. Hence, for
$i \ne j$, $E(\mathbf{x}_j'\mathbf{x}_i) = 0$ and
$V(\mathbf{x}_j'\mathbf{x}_i) = N^2/(N-1)$.
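The two moments just stated can be verified exactly for small N by enumerating the distribution in (A.1) with rational arithmetic. This is a verification sketch only; the function names are assumptions introduced here, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def h_pmf(r):
    """P(H = h) = C(r, h)^2 / C(2r, r) for h = 0, ..., r, as in (A.1)."""
    return [Fraction(comb(r, h) ** 2, comb(2 * r, r)) for h in range(r + 1)]


def inner_product_moments(n_runs):
    """Exact mean and variance of x_j'x_i = 4H - N with H ~ H(N/2)."""
    pmf = h_pmf(n_runs // 2)
    values = [4 * h - n_runs for h in range(len(pmf))]
    mean = sum(p * v for p, v in zip(pmf, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(pmf, values))
    return mean, var


# For N = 4 the inner product takes values -4, 0, 4 with
# probabilities 1/6, 4/6, 1/6, giving mean 0 and variance 16/3.
mean_4, var_4 = inner_product_moments(4)
```

Using `Fraction` avoids any rounding, so the comparison with $N^2/(N-1)$ is exact rather than approximate.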

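In regard to the distribution of $\mathbf{x}_j'\boldsymbol{\varepsilon}$, we
observe that
$\mathbf{x}_j'\boldsymbol{\varepsilon} \mid \mathbf{x}_j \sim N(0, N\sigma^2)$
since $\mathbf{x}_j'\mathbf{x}_j = N$. Because the conditional distribution of
$\mathbf{x}_j'\boldsymbol{\varepsilon}$ given $\mathbf{x}_j$ is the same for
any realization of $\mathbf{x}_j$, the result is therefore true
unconditionally. Thus, $E(\mathbf{x}_j'\boldsymbol{\varepsilon}) = 0$ and
$V(\mathbf{x}_j'\boldsymbol{\varepsilon}) = N\sigma^2$.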

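To find the covariance between $\hat\beta_i$ and $\hat\beta_j$, for
$i \ne j$, we have

    $$\mathrm{cov}(\hat\beta_i, \hat\beta_j) = E(\hat\beta_i\hat\beta_j) - \beta_i\beta_j. \qquad (A.2)$$

We write $N\hat\beta_m = \gamma_m + \lambda_m$ where
$\lambda_m = \mathbf{x}_m'\boldsymbol{\varepsilon}$ and
$\gamma_m = \sum_{q=1}^{K} \beta_q \mathbf{x}_m'\mathbf{x}_q$. Thus,

    $$N^2 E(\hat\beta_i\hat\beta_j) = E(\gamma_i\gamma_j) + E(\gamma_i\lambda_j) + E(\gamma_j\lambda_i) + E(\lambda_i\lambda_j). \qquad (A.3)$$

It is easy to show that
$E(\gamma_i\lambda_j) = E(\gamma_j\lambda_i) = E(\lambda_i\lambda_j) = 0$,
so that

    $$\mathrm{cov}(\hat\beta_i, \hat\beta_j) = E(\gamma_i\gamma_j)/N^2 - \beta_i\beta_j. \qquad (A.4)$$

Expanding $E(\gamma_i\gamma_j)$, we have

    $$E(\gamma_i\gamma_j) = \sum_{r=1}^{K} \sum_{q=1}^{K} \beta_r\beta_q E(\mathbf{x}_i'\mathbf{x}_r\, \mathbf{x}_j'\mathbf{x}_q). \qquad (A.5)$$

It is not difficult to verify that, for $i \ne j$,
$E(\mathbf{x}_i'\mathbf{x}_r\, \mathbf{x}_j'\mathbf{x}_q)$ is zero unless
r = i, q = j or r = j, q = i. When r = i and q = j,
$E(\mathbf{x}_i'\mathbf{x}_r\, \mathbf{x}_j'\mathbf{x}_q) = E[(\mathbf{x}_j'\mathbf{x}_j)^2] = N^2$;
when r = j and q = i,
$E(\mathbf{x}_i'\mathbf{x}_r\, \mathbf{x}_j'\mathbf{x}_q) = V(\mathbf{x}_j'\mathbf{x}_i) = N^2/(N-1)$.

Substitution into (A.5) yields

    $$E(\gamma_i\gamma_j) = N^3\beta_i\beta_j/(N-1). \qquad (A.6)$$

Finally, introducing (A.6) into (A.4), we get the desired result

    $$\mathrm{cov}(\hat\beta_i, \hat\beta_j) = \beta_i\beta_j/(N-1). \qquad (A.7)$$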

